* Table 1: Main results

* Open raw voting and registration data for Table 1
import delimited "ky_demos.csv", clear

* Label raw variables (for reference only; the data are collapsed and relabeled below)
label var year "Year of election"
label var code "County, numbered alphabetically per Kentucky Secretary of State"
label var agegroup "Age: 1 = Under 25; 2 = 25-34; 3 = 35-49; 4 = 50-61; 5 = 62 & Over"
label var gender "Sex: F = female; M = male"
label var totalreg "Total number of registrants in age-gender-county-year"
label var totalvot "Total number of voters in age-gender-county-year"
label var demreg "Number of registered Democrats in age-gender-county-year"
label var demvot "Number of Democratic-registered voters in age-gender-county-year"
label var repreg "Number of registered Republicans in age-gender-county-year"
label var repvot "Number of Republican-registered voters in age-gender-county-year"
label var otherreg "Number of other registrants in age-gender-county-year"
label var othervot "Number of non-major-registered voters in age-gender-county-year"
label var timezone "Time Zone: 0 = Eastern Time; 1 = Central Time"
label var congdist "Congressional District: 1 = 1st; 2 = 2nd; 5 = 5th; 6 = 6th"
label var expansion "Flag for counties near but not bordering time-zone border"


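* Aggregate across gender to age-county-year cells, keeping only all-party totals (timezone, congdist, and expansion are carried through as cell means)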
collapse timezone congdist expansion (sum) totalreg totalvot, by(year code agegroup)

* Create age-county panel identifier for panel-corrected standard errors
gen panelvar = agegroup + 10*code

* Create variable marking county-year for clustering of standard errors
gen countyyear = year + 10000*code
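* Sanity check (a sketch, not part of the original replication): the two
* identifiers above pack several codes into one number, which is unambiguous
* only while agegroup is a single digit and year has four digits. Decoding
* them should recover the original variables; -capture noisily- reports any
* contradictions without stopping the do-file.
capture noisily assert mod(panelvar, 10) == agegroup & floor(panelvar/10) == code
capture noisily assert mod(countyyear, 10000) == year & floor(countyyear/10000) == code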

* Set up panel by county-age group (cross-section) and year (time-series)
xtset panelvar year

* Generate turnout (as percentage of registered voters) from raw data
gen totalpct = 100*totalvot/totalreg
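* Diagnostic sketch (an addition, not from the original): Stata returns
* missing for division by zero, so any cell with zero (or missing)
* registrants gets a missing turnout and drops out of the models below.
* The counts report how many such cells exist and how many exceed 100%
* (the !missing() guard is needed because Stata treats missing as larger
* than any number).
count if missing(totalpct)
count if totalpct > 100 & !missing(totalpct)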

* Generate turnout (as proportion of registered voters) for GLM
gen totalprop = totalpct/100

* Create independent variable of interest, interacting age with time zone
gen interact = agegroup*timezone

* Label variables
label var totalreg "Total number of registrants in age-county-year"
label var totalvot "Total number of voters in age-county-year"
label var timezone "Time Zone: 0 = Eastern Time; 1 = Central Time"
label var congdist "Congressional District: 1 = 1st; 2 = 2nd; 5 = 5th; 6 = 6th"
label var panelvar "Code for age group (last digit) and county (earlier digits)"
label var countyyear "Code for year (last four digits) and county (earlier digits)"
label var totalpct "Percent of age-county registrants voting in election year"
label var totalprop "Proportion of age-county registrants voting in election year"
label var interact "Interaction of age group and time zone (agegroup*timezone)"
label var expansion "Flag for counties near but not bordering time-zone border"

* Estimate model for Table 1's Column I 
xtpcse totalpct timezone agegroup interact congdist##year if expansion == 1
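* Equivalence sketch (an addition, not part of the published specification):
* because timezone is coded 0/1, the hand-built interact term equals the
* factor-variable term 1.timezone#c.agegroup, so the model below should
* reproduce the Column I interaction coefficient up to rounding.
* The scalar name colI_interact is hypothetical.
scalar colI_interact = _b[interact]
xtpcse totalpct i.timezone##c.agegroup congdist##year if expansion == 1
display "interact: " colI_interact "   1.timezone#c.agegroup: " _b[1.timezone#c.agegroup]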

* Estimate model for Table 1's Column II
xtpcse totalpct timezone##agegroup congdist##year if expansion == 1

* Estimate marginal effects of age vs. time-zone interaction for Figure 2 (should correspond to the first ten, "Column II" rows of fig2data.csv)
margins timezone#agegroup
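* Sketch (an addition): the Central-minus-Eastern difference at each age
* group, read directly from the Column II model. In this linear model each
* estimate should equal the difference between the paired timezone rows of
* the margins table above, now with its own standard error.
margins agegroup, dydx(timezone)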

* Estimate model for Table 1's Column III
glm totalprop timezone agegroup interact congdist##year if expansion == 1, cl(countyyear) family(binomial)
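* Cross-check sketch (an addition, assuming Stata 14 or later for -fracreg-):
* a binomial-family GLM with the default logit link fitted to a proportion
* is a fractional logit, so -fracreg logit- on the same sample should return
* the same point estimates as Column III. -capture noisily- keeps the
* do-file running if fracreg rejects the sample. The scalar name
* colIII_interact is hypothetical.
scalar colIII_interact = _b[interact]
capture noisily fracreg logit totalprop timezone agegroup interact congdist##year if expansion == 1, vce(cluster countyyear)
if _rc == 0 {
    display "glm: " colIII_interact "   fracreg: " _b[interact]
}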

* Estimate model for Table 1's Column IV
glm totalprop timezone##agegroup congdist##year if expansion == 1, cl(countyyear) family(binomial)

* Estimate marginal effects of age vs. time-zone interaction for Figure 2 (should correspond to the last ten, "Column IV" rows of fig2data.csv)
margins timezone#agegroup
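* Plotting sketch (an addition): -marginsplot- draws the ten predicted
* proportions just computed, with age group on the x-axis and one line per
* time zone. This is only a quick visual check; the published Figure 2
* appears to be built from fig2data.csv instead.
marginsplot, xdimension(agegroup)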



* Appendix: Analysis for extreme age groups separately

xtpcse totalpct timezone congdist##year if agegroup == 1 & expansion == 1
xtpcse totalpct timezone congdist##year if agegroup == 5 & expansion == 1

glm totalprop timezone congdist##year if agegroup == 1 & expansion == 1, cl(countyyear) family(binomial)
glm totalprop timezone congdist##year if agegroup == 5 & expansion == 1, cl(countyyear) family(binomial)



* Appendix: Analysis only for (non-)presidential election years

xtpcse totalpct timezone agegroup interact congdist##year if mod(year, 4) == 0 & expansion == 1
xtpcse totalpct timezone agegroup interact congdist##year if mod(year, 4) != 0 & expansion == 1

xtpcse totalpct timezone##agegroup congdist##year if mod(year, 4) == 0 & expansion == 1
xtpcse totalpct timezone##agegroup congdist##year if mod(year, 4) != 0 & expansion == 1

glm totalprop timezone agegroup interact congdist##year if mod(year, 4) == 0 & expansion == 1, cl(countyyear) family(binomial) 
glm totalprop timezone agegroup interact congdist##year if mod(year, 4) != 0 & expansion == 1, cl(countyyear) family(binomial) 

glm totalprop timezone##agegroup congdist##year if mod(year, 4) == 0 & expansion == 1, cl(countyyear) family(binomial) 
glm totalprop timezone##agegroup congdist##year if mod(year, 4) != 0 & expansion == 1, cl(countyyear) family(binomial) 


* Appendix: Table 1 replicated including counties near but not touching time-zone border

xtpcse totalpct timezone agegroup interact congdist##year
xtpcse totalpct timezone##agegroup congdist##year
glm totalprop timezone agegroup interact congdist##year, cl(countyyear) family(binomial)
glm totalprop timezone##agegroup congdist##year, cl(countyyear) family(binomial)



* Appendix: Table 1 replicated using turnout as a proportion of voting-age population rather than of registered voters

* Note that "totalvap" derives from American Community Survey 5-year data, which provide estimates as percentages of each county's total population. Estimated populations are therefore typically not integers.

import delimited "ky_demos.csv", clear
collapse timezone congdist expansion (sum) totalvap totalvot, by(year code agegroup)
gen panelvar = agegroup + 10*code
gen countyyear = year + 10000*code
xtset panelvar year

* Generate turnout (as percentage of estimated voting-age population in the age-county-year cell) from raw data
gen totalpct = 100*totalvot/totalvap

* Generate turnout (as proportion of estimated voting-age population in the age-county-year cell) for GLM
gen totalprop = totalpct/100
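* Diagnostic sketch (an addition): because totalvap is an ACS estimate,
* some cells may show more voters than estimated residents, i.e., a
* proportion above 1, which a binomial-family GLM is not designed for.
* The count reports how many such cells exist; it does not alter the data.
count if totalprop > 1 & !missing(totalprop)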

gen interact = agegroup*timezone

xtpcse totalpct timezone agegroup interact congdist##year if expansion == 1
xtpcse totalpct timezone##agegroup congdist##year if expansion == 1
glm totalprop timezone agegroup interact congdist##year if expansion == 1, cl(countyyear) family(binomial)
glm totalprop timezone##agegroup congdist##year if expansion == 1, cl(countyyear) family(binomial)



* Appendix: Analysis for placebo test

* Open raw voting and registration data for placebo test
import delimited "ky_placebo.csv", clear

* Label raw variables (for reference only; the data are collapsed and relabeled below)
label var year "Year of election"
label var code "County, numbered alphabetically per Kentucky Secretary of State"
label var agegroup "Age: 1 = Under 25; 2 = 25-34; 3 = 35-49; 4 = 50-61; 5 = 62 & Over"
label var female "Sex: 1 = female; 0 = male"
label var register "Total number of registrants in age-gender-party-county-year"
label var voted "Total number of voters in age-gender-party-county-year"
label var part "Party of registration: d = Democratic; r = Republican; o = Other"
label var timezone "Time Zone: 0 = Eastern Time; 1 = Central Time"
label var congdist "Congressional District: 1 = 1st; 2 = 2nd; 5 = 5th"

* Aggregate across gender to age-party-county-year cells
collapse timezone congdist (sum) register voted, by(year code agegroup part)

* Create numeric party codes
encode part, gen(partyno)

* Create age-party-county panel identifier for panel-corrected standard errors
gen panelvar = partyno + 10*agegroup + 100*code
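* Sanity check sketch (an addition): -encode- numbers string values in
* alphabetical order, so, assuming part holds exactly d, o, and r,
* partyno should be 1 = d, 2 = o, 3 = r; -label list- displays the mapping
* used in the margins output below. The checks then confirm that each digit
* of panelvar holds a single-digit code and that panelvar and year uniquely
* identify observations.
label list partyno
capture noisily assert inrange(partyno, 1, 9) & inrange(agegroup, 1, 9)
capture noisily isid panelvar year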

* Generate turnout as percentage of registered voters
gen turnout = 100*voted/register

* Label variables
label var timezone "Time Zone: 0 = Eastern Time; 1 = Central Time"
label var congdist "Congressional District: 1 = 1st; 2 = 2nd; 5 = 5th"
label var panelvar "Code for party (last digit), age (penultimate digit), county (earlier digits)"
label var register "Total number of registrants in age-party-county-year"
label var voted "Total number of voters in age-party-county-year"
label var turnout "Percent of age-party-county registrants voting in election year"

* Set up panel by county-age-party (cross-section) and year (time-series)
xtset panelvar year

* Estimate placebo-test model
xtpcse turnout timezone##agegroup##partyno congdist##year

* Generate data for the appendix figure (should correspond to the data in figappdata.csv)
margins timezone#agegroup#partyno, pwcompare
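* Sketch (an addition): the comparisons of main interest in the pairwise
* table above, the Central-minus-Eastern turnout difference within each
* age-party cell, can be requested directly. In this linear model they
* should match the corresponding 1.timezone versus 0.timezone contrasts
* from -pwcompare-.
margins agegroup#partyno, dydx(timezone)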



* Table 2: County-level analysis with control variables, estimating difference between turnout among older and younger voters

* Import raw data for county-level analysis
import delimited "ky_county.csv", clear

* Reduce skew in population density with a natural-log transformation
gen lnpopdns = ln(popsqkm)
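* Diagnostic sketch (an addition): ln() returns missing for zero or negative
* values, so the count flags any county-years that lnpopdns would drop; the
* detailed summary reports skewness before and after the transformation.
count if popsqkm <= 0
summarize popsqkm lnpopdns, detail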

* Rescale median household income to thousands of dollars so its coefficient is on a scale comparable to the other regressors
replace hhinc2010 = hhinc2010/1000

* Label variables
label var year "Year of election"
label var code "County, numbered alphabetically per Kentucky Secretary of State"
label var county "County name"
label var agegap1 "Percentage-point difference in turnout between under-25 and over-61 age groups"
label var agegap2 "Percentage-point difference in turnout between under-35 and over-49 age groups"
label var under25 "Percentage of population under age 25"
label var over65 "Percentage of population aged 65 or over"
label var totblk "Percentage of population that is Black (Hispanic or non-)"
label var tothisp "Percentage of population that is Hispanic (all races)"
label var unemp "County unemployment rate (source: Bureau of Labor Statistics)"
label var hhinc2010 "Median household income, thousands of 2010 dollars (deflated using Consumer Price Index)"
label var popsqkm "Population density per square kilometer of land area"
label var timezone "Time Zone: 0 = Eastern Time; 1 = Central Time"
label var congdist "Congressional District: 1 = 1st; 2 = 2nd; 5 = 5th"
label var lnpopdns "Natural logarithm of population density: ln(popsqkm)"

* Set up panel by county (cross-section) and year (time-series)
xtset code year

* Estimate model for Table 2's Column I
xtpcse agegap1 timezone under25 over65 totblk tothisp lnpopdns unemp hhinc2010 congdist##year

* Estimate model for Table 2's Column II
xtpcse agegap2 timezone under25 over65 totblk tothisp lnpopdns unemp hhinc2010 congdist##year



* Appendix: Falsification analysis (between-effects regressions of each county-level control on time zone, using county means)

xtreg under25 timezone, be
xtreg over65 timezone, be
xtreg totblk timezone, be
xtreg tothisp timezone, be
xtreg lnpopdns timezone, be
xtreg unemp timezone, be
xtreg hhinc2010 timezone, be
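* Summary sketch (an addition): re-runs the seven between-effects
* regressions above quietly and collects their timezone coefficients and
* standard errors into one table. The stored-estimate names be_* and the
* local benames are hypothetical.
local benames
foreach v of varlist under25 over65 totblk tothisp lnpopdns unemp hhinc2010 {
    quietly xtreg `v' timezone, be
    estimates store be_`v'
    local benames `benames' be_`v'
}
estimates table `benames', keep(timezone) b se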
